.. _system_preparation: Preparing systems for constant-pH simulation ******************************************** Several steps are necessary to prepare a simulation system for simulation using ``protons``. Most of ``protons`` relies on the `OpenMM API for Amber files`_. In order to run constant-pH simulations, the ``protons`` package requires input files generated by several programs from the Ambertools suite. These instructions assume that you have a ``.pdb`` file of your system that has all necessary components to start an MD simulation. If you need help adding missing residues, atoms, or other features to your system, consider reading the OpenMM Modeller_ documentation, or the `Amber manual`_. Using the examples shown below, we will demonstrate how to generate input for an implicit solvent constant-pH simulation, including * An amber topology file (``.prmtop``), which contains the parameters and topology of the system. * A coordinate file (``.inpcrd``), which contains the 3-dimensional coordinates of your system. * A protonation state file (``.cpin``), which enumerate protonation states of amino acids. .. _Modeller: http://docs.openmm.org/7.0.0/userguide/application.html#model-building-and-editing .. _OpenMM API for Amber files: http://docs.openmm.org/7.0.0/userguide/application.html#using-amber-files .. _Amber manual: http://ambermd.org/doc12/Amber16.pdf Obtaining and using Ambertools ============================== Ambertools provides a series of command-line utilities that aid in setting up simulation systems. We will be using these for the manipulation of ``.pdb`` and ``.mol2`` files, and for generating parameters for small molecules using the general Amber force field (GAFF). The instructions below have been tested with Ambertools16. Ambertools16 can be obtained from the `Amber website`_. For any instructions relating to Ambertools programs, we refer to the `Amber manual`_. This includes instructions on how to install Ambertools. After following the installation procedure, please make sure you can access the following programs on your command-line: * ``tleap``, * ``cpinutil.py``, and if you're planning to use small molecules: * ``antechamber``, * ``parmchk``. We will be using these to prepare systems for constant-pH simulation in OpenMM. .. _Amber website: http://ambermd.org/AmberTools16-get.html. Fixing residue names and atoms ============================== Several modifications will need to be made to your input file, in order to rename residues, and prevent issues with preexisting hydrogen atoms. Renaming residues ----------------- One of the first steps of preparing an input file for constant-pH simulation is to rename all residues in the pdb file to their most protonated form. The shortlist for PDB residues that our code supports is as follows: * Histidine (HIP) * Aspartic acid (AS4) * Glutamic acid (GL4) * Cysteine (CYS) * Lysine (LYS) * Arginine (ARG) .. Note:: You may also need to rename any cysteine residues involved in disulfide bonds from ``CYS → CYX``, for further processing steps. Treating hydrogens ------------------ To properly add all necessary hydrogens to the system, it is easiest to first delete all current hydrogens in the system. Otherwise, improperly named hydrogens may cause issues when defining the topology of the system based on the ``.pdb`` file. Missing hydrogens will be added back in the next preparation step, so it is safe to delete them at this stage. An example bash script for processing ``.pdb`` files ---------------------------------------------------- Calling the following shell script on your pdb file should help you prepare your ``.pdb`` file for simulation. It replaces the names of residues in-place in the file, and it removes hydrogens from your file. .. code-block:: bash pdbfile="input.pdb" # Rename to constph residues sed -i 's/HIE/HIP/g' ${pdbfile} sed -i 's/HID/HIP/g' ${pdbfile} sed -i 's/HIS/HIP/g' ${pdbfile} sed -i 's/ASP/AS4/g' ${pdbfile} sed -i 's/GLU/GL4/g' ${pdbfile} # Remove hydrogens (print lines with atom names not starting with H) awk '$3 !~ /^ *H/' ${pdbfile} > tmp && mv tmp ${pdbfile} Tools like ``sed`` and ``awk`` can be very useful for manipulating text files quickly, though I recommend that you verify the output manually afterwards. Writing out coordinate and parameter files using Leap ===================================================== The next step in system preparation is generating the Amber coordinate (``.inpcrd``), and the topology and parameters files (``.prmtop``). This can be done using the command-line utility ``tleap``, which is available as part of ambertools. These instructions will suffice if your system does not include any small molecules. If your system contains a ligand, please see the `next section`_. You can use ``tleap`` interactively on the command line. Just type .. code-block:: bash tleap And you will see output similar to this .. code-block:: c -bash-4.1$ tleap -I: Adding /home/user/bin/../dat/leap/prep to search path. -I: Adding /home/user/bin/../dat/leap/lib to search path. -I: Adding /home/user/bin/../dat/leap/parm to search path. -I: Adding/home/user/bin/../dat/leap/cmd to search path. Welcome to LEaP! (no leaprc in search path) > █ You can start typing your commands line by line. Alternatively, you can store commands in a text file, and then use .. code-block:: bash tleap -f tleap.txt and tleap will run the specified commands automatically. Tleap output can be rather verbose. It is recommended to write the output to file, so you can verify that all steps were executed correctly. Here is a bash example: .. code-block:: bash tleap -f tleap.in >> tleap.out 2>&1 You can rename the ``.out`` file to anything of your choosing. Tleap commands -------------- The following sequence of commands should do for a simple pdb file containing one protein structure. .. code-block:: c # Load constant ph parameters source leaprc.constph # Load the PDB file, rename it to your input file protein = loadPDB input.pdb # Validate the input check protein # Calculate the total charge, for logging purposes charge protein # Write parameters. saveAmberParm protein complex.prmtop complex.inpcrd # Write PDB files, optional savepdb protein complex.pdb # Exit, make sure not to forget this part quit If you need to perform other steps to prepare your system for simulation, please read the `Amber manual`_. Validating tleap results ------------------------ If you run interactively, tleap should provide error messages on screen. The output can be rather verbose, so make sure that your terminal is configured to scroll back far. Alternatively, if you run using an input file, make sure that ``tleap`` ran successfully. I often write output to a log file, and check the log file for errors. Here is a short bash snippet that does the trick. .. code-block:: bash tleap -f tleap.in >> tleap.out 2>&1 # There might be other error clues. This method isn't fail safe. tleap_result=$(grep "usage" tleap.out || grep -i "error" tleap.out) # As long as the grep results are empty if [ -z "$tleap_result" ] then echo -e "\e[32mTleap looks successful. Still, act cautious. She's a slippery one.\e[39m" else echo -e "\e[31mCaught an error in Tleap. Tough luck, buddy.\e[39m" echo $tleap_result fi This procedure generates three different files: * ``complex.prmtop``, an Amber topology file which contains the topology and parameters of the protein system. * ``complex.inpcrd``, a file containing the coordinates of all atoms in the system * ``complex.pdb``, this file is optional. You can use a ``pdb`` file in software such as PyMOL_, to verify that the prepared structure doesn't contain mistakes. You will be needing these to run your OpenMM script. .. _PyMOL: http://pymol.org/ Including ligands in your system ================================ .. _next section: .. warning:: * Ligand support is a work in progress. We've experienced system instability with small molecules in implicit solvent simulations. If you have a ligand, you will have to prepare your ligand using ``antechamber``, and ``parmchk``. This is used to generate two files * ``ligand.gaff.mol2``, a mol2 file with GAFF atom types. * ``ligand.gaff.frcmod``, an frcmod file with GAFF parameters for the ligand. Here is an example of how to run ``antechamber`` and ``parmchk``. .. code-block:: bash antechamber -i ligand.mol2 -fi mol2 -o ligand.gaff.mol2 -fo mol2 parmchk -i ligand.gaff.mol2 -o ligand.gaff.frcmod -f mol2 You may wish to explore the advanced options of ``antechamber`` if you need to generate charges for your ligands. If you want to generate charges in another program, using a ``.mol2`` file should allow you to maintain those charges. Now that you've generated parameters for your ligand, these files then need to be added to your leap setup. Here is an example leap script. .. code-block:: c # Load constant ph parameters source leaprc.constph # Gaff params source leaprc.gaff # Load ligand parameters ligand = loadMol2 ligand.gaff.mol2 loadAmberParams ligand.gaff.frcmod # Load the PDBs protein = loadPDB protein.pdb # Combine into one complex complex = combine { protein ligand } # Validate the input check complex # Calculate the total charge, for logging purposes charge complex # Write parameters. saveAmberParm complex complex.prmtop complex.inpcrd # Write PDB files savepdb protein complex.pdb # Exit, make sure not to forget this part quit .. todo:: * In the current version of the code, ligands can not be treated using constant-pH methodologies. Generating parameters for amino acid protonation states ======================================================= The last step to generate input for the constant-pH simulation is to generate a ``.cpin`` file for your protein. This file contains the parameters for the different protonation states of the amino acids in the system. A ``.cpin`` file can be generated by ``cpinutil.py``, which is also distributed as part of Ambertools. .. code-block:: bash cpinutil.py -resnames HIP GL4 AS4 TYR LYS CYS -p complex.prmtop -o complex.cpin